Audiovisual data decoding method

ABSTRACT

The present invention relates to a method of decoding audio-visual data that makes it possible to process, on the basis of an improved syntactic language, distinct elements of a scene as objects for which individual animations, particular user/element interactions, and specific relations between the elements and the defined animations and/or interactions can be provided. The description is organized in a hierarchical tree that also includes transversal connections, provided both for embedding bidimensional and/or tridimensional objects in each other and for optionally controlling the rendering of scenes from various viewpoints, while maintaining control of all related actions both in the embedded objects and/or scenes and in the original ones.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method of decoding coded digital signals representative of audiovisual data and available in the form of a continuous bitstream, in view of the binary description of a scene to be rendered on a displaying device, said method comprising a processing operation based on an evolutive syntactic language and provided for extracting from said bitstream, in a first step, distinct elements called objects according to the structure of said scene, defining, in a second step, an individual animation of said elements of the scene, defining, in a third step, particular interactions between a user and said elements, and organizing, in a fourth step, specific relations between said scene elements and corresponding individual animations and/or user interactions according to various classes of applications. This invention will be mainly used in the future MPEG-4 decoders.

2. Description of the Related Art

The most important goal of the well-known MPEG-1 and MPEG-2 standards, dealing with frame-based video and audio, was to make storage and transmission more efficient by compressing the concerned data. The future new MPEG-4 decoding standard will be fundamentally different, as it will represent the audiovisual scenes as a composition of objects rather than pixels only. Each scene is defined as a coded representation of audiovisual objects that have given relations in space and time, whatever the manner in which said given scene has been previously organized in these objects (or segmented).

Up to now, the standardization bodies dealing with natural and synthetic sources used to be different. As good three-dimensional (3D) capabilities are becoming an increasingly important part of many fields, including multimedia and World Wide Web applications that use VRML (VRML, or Virtual-Reality Modelling Language, is now the standard for specifying and delivering 3D-graphics-based interactive virtual environments), the MPEG-4 standard considers jointly the natural materials (video, audio, speech) and the synthetic ones (2D and 3D graphics and synthetic sound) and tries to combine them in a standardized bitstream, in view of the presentation of such a multimedia content on a terminal screen. In order to compose this audiovisual information within the scene, their spatio-temporal relationship needs to be transmitted to the terminal.

The MPEG-4 standard defines a syntactic description language to describe the binary syntax of an audiovisual object's bitstream representation as well as that of the scene description information. More precisely, the MPEG-4 system Verification Model 4.0 proposes, for the description of the scenes, a binary format called the Binary Format for Scenes (BIFS). This description, constructed as a coded hierarchy of nodes with attributes and other information such as event sources and targets, is based on the assumption that the scene structure is transmitted as a parametric description (or a script) rather than as a computer program. The scene description can then evolve over time by using coded scene description updates. The node descriptions, which are conveyed in a BIFS syntax, may also be represented, for the purpose of clarity, in a textual form. Some MPEG-4 nodes and concepts are direct analogues of the VRML 2.0 nodes. Others are modified VRML 2.0 nodes, and still others are added for specific MPEG-4 requirements. Like the VRML 2.0 syntax, the BIFS has provisions for describing simple behaviors and interaction with the user through an event passing mechanism. However, some problems, explained hereunder, are not solved by this format.

The first of these addressed problems concerns a unified description of a mixed 2D and 3D scene. There is indeed a fundamental difference between the description of a purely 3D scene, the description of a purely 2D scene, and the description of a mixed 2D/3D scene. In a 3D scene, the layering of the objects is based on the depth information. In 2D, the notion of depth is absent and the layering should be defined explicitly. Furthermore, mixing 2D and 3D objects may be accomplished in several ways:

(1) embedding of 3D objects in a 2D scene:

(a) this is, for example, the case when one tries to render 3D objects in front of a 2D background: in this case, when the user navigates in the scene, the background does not move;

(b) another example is an application in which the user interface contains 2D objects (such as buttons or text) and a 3D viewer where the scene is rendered;

(2) embedding of 2D objects in a 3D scene:

(a) this is, for example, the case when one uses a video object as a texture map on 3D objects;

(b) another example is a texture made of 2D graphic objects (a special case of this is an “active map”, that is, a 2D plane in a 3D scene made of several composited 2D objects);

(3) these two schemes may be mixed recursively, for example, for embedding 3D objects in a 2D scene and using the resulting composition as a texture map on 3D objects (this may be used to simulate the reflection of a mirror);

(4) a last possibility is to view simultaneously the same 3D scene from different view points.

At present, it is not possible to describe all these possibilities using a single scene graph. A scene graph is a tree that represents a scene by means of a hierarchy of objects called nodes. The scene is composed of grouping nodes and children nodes. The role of grouping nodes is to define the hierarchy and the spatial organization of the scene. Children nodes are the leaves of the tree. These nodes are used to define geometric objects, light sources as well as various types of sensors (objects that are sensitive to user interaction). Grouping nodes have children nodes. These children may be children nodes or other grouping nodes.

All nodes may have attributes which are called fields. The fields may be of any type. For example, Sphere is a geometry node. It has a field that defines its radius. It is a single value field of type float (SFFloat). Children nodes of a grouping node are specified in a special field. This field is a multiple value field (a list of nodes), and each value is of type node (MFNode).
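By way of illustration only, the node and field vocabulary introduced above may be sketched in Python as follows. The class names `Node`, `Sphere` and `Group`, and the helper `count_leaves`, are hypothetical and are not part of BIFS or VRML; the sketch merely assumes that a single value field can be modelled as an attribute and a multiple value field of nodes as a list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Base class for every scene-graph node (hypothetical name)."""
    name: str


@dataclass
class Sphere(Node):
    """Children (leaf) node: a geometry node with one single value field."""
    radius: float = 1.0  # plays the role of an SFFloat field


@dataclass
class Group(Node):
    """Grouping node: its children are held in a multiple value field."""
    children: List[Node] = field(default_factory=list)  # plays the role of an MFNode field


def count_leaves(node: Node) -> int:
    """Count the children (leaf) nodes reachable from `node`."""
    if isinstance(node, Group):
        return sum(count_leaves(child) for child in node.children)
    return 1


# A grouping node whose children are a leaf and another grouping node.
scene = Group("root", [Sphere("ball", radius=2.0), Group("inner", [Sphere("dot")])])
```

In this sketch, `count_leaves(scene)` visits the two `Sphere` leaves through the nested grouping nodes.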

Now, in order to define animations and user interaction in the scene, it is possible to make connections between fields using an event passing mechanism called routing. Routing a field A to a field B means that whenever field A changes, field B will take the same value as field A. Only fields of the same type (or the same kind) may be connected. Fields may be specialized: some may only be the destination of a route, and are called eventIn; others may only be at the origin of a route, and are called eventOut; others may be both the origin and destination of routes, and are called exposedField; and, lastly, others may not be connected, and are simply called field.
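The routing rules stated above may be illustrated, purely as a sketch, by the following Python fragment. The names `Kind`, `Field`, `route_to` and `set` are hypothetical; the fragment assumes that propagation occurs only when the value actually changes, which also prevents endless propagation between two mutually routed exposedFields.

```python
from enum import Enum


class Kind(Enum):
    EVENT_IN = "eventIn"            # may only be the destination of a route
    EVENT_OUT = "eventOut"          # may only be the origin of a route
    EXPOSED_FIELD = "exposedField"  # may be origin and destination
    FIELD = "field"                 # may not be connected


class Field:
    def __init__(self, type_name, kind, value=None):
        self.type_name = type_name  # e.g. "SFFloat"
        self.kind = kind
        self.value = value
        self.routes = []            # destination fields of this field

    def route_to(self, dest):
        """Connect this field to `dest`, enforcing the rules quoted above."""
        if self.kind not in (Kind.EVENT_OUT, Kind.EXPOSED_FIELD):
            raise ValueError("this field cannot be the origin of a route")
        if dest.kind not in (Kind.EVENT_IN, Kind.EXPOSED_FIELD):
            raise ValueError("this field cannot be the destination of a route")
        if self.type_name != dest.type_name:
            raise TypeError("only fields of the same type may be connected")
        self.routes.append(dest)

    def set(self, value):
        """Change the value and propagate it along every route."""
        if value == self.value:
            return  # assumption: nothing is propagated when nothing changes
        self.value = value
        for dest in self.routes:
            dest.set(value)


# Usage: routing an exposedField to an eventIn of the same type.
source = Field("SFFloat", Kind.EXPOSED_FIELD, 0.0)
target = Field("SFFloat", Kind.EVENT_IN)
source.route_to(target)
source.set(0.5)  # target.value now follows source.value
```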

In VRML, four nodes (Viewpoint, Background, Fog and NavigationInfo) play a special role in the sense that only one of each may be active at a given time. These nodes are said to be bindable nodes.

There are many reasons to try to integrate both 2D and 3D features in one coherent framework:

it is possible to use the same event passing mechanism for the whole 2D/3D scene;

the representation of content can be more compact;

the implementation can be optimized because 2D and 3D specifications have been designed to work together.

In order to fulfill these requirements, one needs to be able to compose, in a 2D space, 2D and 3D layers representing the result of the rendering of a 2D or a 3D scene, as well as to use the result of the rendering of a 2D or 3D scene as an input to other nodes in the scene graph.

Other problems, not yet solved, also have to be considered, especially the following ones:

(1) interactivity with the 2D objects: it may be necessary to be able to interact with the objects, change the layering, add or remove objects, which is not possible without a method to set the depth of a 2D object that is compatible with the event passing mechanism of VRML 2.0;

(2) single event routing mechanism, in order to be provided with interactivity and simple behavior capabilities: an example of this could be the display of a 2D map in a walk-through application, the map being used to navigate, which requires the capacity to route a user-triggered event from a 2D object (the map) to the 3D scene (the view point);

(3) global hierarchy of the scene: while a scene graph representation involves a hierarchical organization of the scene, 2D or 3D layers should not be considered as other graphic objects, and mixed with the global scene graph (moreover, layers may be hierarchical, as illustrated for instance in the layer graph of FIG. 1, explained later);

(4) interactivity with video objects: one of the features of MPEG-4 video is an object level interaction, i.e., the description of video as a set of objects rather than a set of pixels, which allows the interaction with the content of the video (such as cut and paste of an object within a video) and needs to be defined for each application by the content creator (said interaction, being not a feature of the terminal itself, may be described by means of BIFS, but, for this, the composition of the various video objects has to be described in the BIFS itself).

SUMMARY OF THE INVENTION

It is therefore an object of the invention to provide an enhancement of the BIFS in order to fully describe the composition of a complex scene built from both 2D and 3D objects. This enhancement allows a unified representation of the complete scene and its layout, as well as event passing not only within the 3D scene (as in VRML 2.0) but also between 2D and 3D nodes, and also allows the definition of specific user interfaces that may be transmitted with the scene, rather than the use of a default user interface provided by the terminal.

To this end, the invention relates to a method as described in the preamble of the description and which is further characterized in that said processing operation also includes an additional step for describing a complex scene, built from any kind of bidimensional and tridimensional objects, according to a framework integrating both bidimensional and tridimensional features and unifying the composition and representation mechanisms of the scene structure.

More precisely, said framework may be characterized in that said additional description step comprises a first main sub-step for defining a hierarchical representation of said scene according to a tree structure organized both in grouping nodes, that indicate the hierarchical connections giving the spatial composition of the concerned scene, and in children nodes, that constitute the leaves of the tree, and a second auxiliary sub-step for defining possible transversal connections between any kind of nodes.

In an advantageous embodiment of the proposed method, the nodes of the tree structure comprise at least bidimensional and tridimensional objects, and the auxiliary definition sub-step comprises a first operation for embedding at least one of said bidimensional objects within at least one of said tridimensional objects, an optional second operation for defining transversal connections between said tridimensional and bidimensional objects, and an optional third operation for controlling the definition step of at least one individual animation and/or at least one particular interaction both in the embedded bidimensional object(s) and in the corresponding original one(s).

In another advantageous embodiment of the method, the nodes of the tree structure comprise at least bidimensional and tridimensional objects, and the auxiliary definition sub-step comprises a first operation for embedding at least one of said tridimensional objects within at least one of said bidimensional objects, an optional second operation for defining transversal connections between said bidimensional and tridimensional objects, and an optional third operation for controlling the definition step of at least one individual animation and/or at least one particular interaction both in the embedded tridimensional object(s) and in the corresponding original one(s).

In another advantageous embodiment of the method, the nodes of the tree structure comprise at least tridimensional objects, and the auxiliary definition sub-step comprises a first operation for embedding at least one of said tridimensional objects within at least one of any one of said tridimensional objects, an optional second operation for defining transversal connections between said tridimensional objects, and an optional third operation for controlling the definition step of at least one individual animation and/or at least one particular interaction both in the embedded tridimensional object(s) and in the corresponding original one(s).

In either of these last two embodiments, it can be noted that said auxiliary definition sub-step may also comprise an additional operation for controlling the simultaneous rendering of at least one single tridimensional scene from various viewpoints while maintaining the third operation for controlling the definition step of the individual animation(s) and/or the particular interaction(s).

The invention relates not only to the previously described method, with or without the optional operations, but also to any signal obtained by implementing such method in any one of its variants. It is clear, for instance, that the invention relates to a signal obtained after having extracted from the input bitstream, in a first step, distinct elements called objects according to the structure of a scene, defined, in a second step, an individual animation of said elements of the scene, defined, in a third step, particular interactions between a user and said elements, organized, in a fourth step, specific relations between said scene elements and corresponding individual animations and/or user interactions according to various classes of applications, and carried out an additional step for describing a complex scene, built from any kind of bidimensional and tridimensional objects, according to a framework integrating both bidimensional and tridimensional features and unifying the composition and representation mechanisms of the scene structure.

Such a signal makes it possible to describe, together, bidimensional and tridimensional objects, and to organize a hierarchical representation of a scene according to a tree structure, itself organized in grouping nodes defining the hierarchical connections and in children nodes, said nodes together forming a single scene graph constituted of a 2D scene graph, a 3D scene graph, a layers scene graph, and transversal connections between nodes of this scene graph.

Such a signal also makes it possible to define 2D or 3D scenes already composed or that have to be composed on a screen, with a representation of their depth, or to define 3D scenes in which other scenes already composed of 2D or 3D objects will be embedded, or also to define textures for 3D objects themselves composed of other 3D or 2D objects. In fact, such a signal makes it possible to interact with any 2D or 3D object of the scene and to organize any kind of transmission of data between all these objects of the scene. Obviously, the invention also relates to a storage medium for memorizing said signal, whatever its type or its composition. Finally, the invention also relates to a device for displaying or delivering in any other manner graphic scenes on the basis of signals such as described above, in order to reconstruct any kind of scene including bidimensional and tridimensional objects.

BRIEF DESCRIPTION OF THE DRAWING

The particularities and advantages of the invention will become more apparent from the following description and the accompanying drawing, in which the sole FIG. 1 is a complete scene graph example.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The scene graph of FIG. 1 shows a hierarchical representation of said scene, according to a tree structure. This structure is a hierarchy of layers that represent rectangular areas of the screen of a displaying device, and said hierarchy is organized in nodes (either in grouping nodes GN defining the hierarchical connections or in children nodes CN that are the leaves of the tree), with, according to the invention, possible transversal connections between these nodes (in FIG. 1, for example between the child node 3D Object-2 and the grouping node 2D Scene-1, for illustrating the situation where a 3D object includes a 2D scene, or between the grouping nodes 3D Scene-2 and 3D Scene-1, for illustrating the situation where two “Layer3D” include the same 3D scene seen from different viewpoints).

In said illustrated scene graph, three different scene graphs are in fact provided: the 2D graphics scene graph, the 3D graphics scene graph, and the layers scene graph. As shown in the picture, the 3D Layer-2 views the same scene as the 3D Layer-1, but the viewpoint may be different. The 3D Object-3 is an appearance node that uses the 2D Scene-1 as a texture node.

The principle of the invention is to provide new nodes that unify the description of the 2D/3D composition as a single graph.

First, two new nodes are defined in order to describe the hierarchy of 2D and 3D layers. The 2D and 3D layers are composited as a hierarchical set of rendering areas that are 2D planes:

Layer2D: children nodes of a Layer2D can be Layer2D nodes, Layer3D nodes, and all nodes acceptable for a 2D scene description;

Layer3D: children nodes of a Layer3D can be 2D or 3D layers and a scene graph describing a 3D scene.

Two new nodes are also defined in order to be able to use 2D and 3D composited scenes as input for a texture in a 3D world, to be mapped on a 3D object:

Composite2DTexture: this is a texture map containing as children nodes a 2D scene, and the composited 2D scene is used as the texture map;

Composite3DTexture: this is a texture map containing children nodes defining a 3D scene. The composited 3D scene is used as the texture map. It is in particular possible to use this node to map the result of the rendering of an existing 3D scene viewed from another view point. This node is useful to simulate reflection effects for instance.

A useful special case of the above is when a composited 2D scene is mapped on a rectangle in the 3D space. This can be seen as an “active map” inserted in the 3D space. Because the implementation of such a node can be very different from the implementation of the Composite2DTexture node, it is meaningful to design a specific node for this case. An ActiveMap node (defined below under the name CompositeMap) is thus proposed in the remainder of the description.

Finally, in order to route pre-defined values of the viewpoint or other bindable children nodes to one of the above quoted nodes, a specific Valuator node is defined. This node can be used in a broader scope in the BIFS specification, or could be defined as a compliant VRML 2.0 prototype.

The principle of the invention having been explained, the definition and semantic of these new nodes will now be more precisely indicated in the following paragraphs (A) to (F).

(A) Layer2D Definition and Semantic

The Layer2D node is defined as a grouping node. It defines an area on the screen where 2D objects will be rendered. Three fields (or attributes) describe how this node will be rendered with respect to other objects: its size, its position and its depth. These fields may be the origin or the destination of routes. They are thus exposedFields. This Layer2D node may be the parent of other nodes of the same type (i.e., also Layer2D) or of a similar type defined below (Layer3D). This may be described by a multiple value field of type node (MFNode). Besides, this node may be the parent of nodes representing 2D objects. This also may be described by a multiple value field of type node (MFNode).

In the BIFS language, the Layer2D node is described as follows:

Layer2D {
  exposedField MFNode  children2D    []
  exposedField MFNode  childrenLayer []
  exposedField SFVec2i size          −1 −1
  exposedField SFVec2i translation   0 0
  exposedField SFFloat depth         0
}

The children2D field can have, as value, any 2D grouping or children nodes that define a 2D scene. The childrenLayer field can take either a 2D or 3D layer node as value. The ordering (layering) of the children of a Layer2D node is explicitly given by the use of Transform2D nodes. If two 2D nodes are the children of a same Transform2D, the layering of 2D nodes is done in the order of the children in the children field of the Transform2D.

The layering of the 2D and 3D layers is specified by the translation and depth fields. The size parameter is given as floating-point numbers, and may be expressed in pixels, or between 0.0 and 1.0 in “graphics meters”, according to the context. The same goes for the translation parameter. A size of −1 in one direction means that the Layer2D node is not specified in size in that direction, and that the viewer would decide the size of the rendering area.

All the 2D objects under a same Layer2D node form a single composed object. This composed object is viewed by other objects as a single object. In other words, if a Layer2D node A is the parent of two objects B and C layered one on top of the other, it will not be possible to insert a new object D between B and C unless D is added as a child of A.
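The consequence of treating a Layer2D as a single composed object may be illustrated by the following Python sketch. The class `Layer2D` and the function `draw_order` used here are hypothetical simplifications; the sketch assumes that children are drawn back to front in list order and represents 2D objects by plain strings.

```python
class Layer2D:
    """Simplified stand-in for a Layer2D node (hypothetical model)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])  # assumed drawn back to front


def draw_order(item):
    """Return object names back to front; a layer's content stays contiguous."""
    if isinstance(item, Layer2D):
        order = []
        for child in item.children:
            order.extend(draw_order(child))
        return order
    return [item]


layer_a = Layer2D("A", ["B", "C"])
root = Layer2D("root", ["background", layer_a, "D"])

# As a sibling of A, D can only lie entirely below or above the pair B, C.
order_as_sibling = draw_order(root)

# To place an object between B and C, it has to be added as a child of A.
layer_a.children.insert(1, "D2")
order_as_child = draw_order(root)
```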

(B) Layer3D Definition and Semantic

Similarly, the Layer3D node is defined as a grouping node. It defines an area on the screen where 3D objects will be rendered. Three fields (or attributes) describe how this node will be rendered with respect to other objects: its size, its position and its depth. These fields may be the origin or the destination of routes. They are thus exposedFields. This node may be the parent of other nodes of the same type (i.e., Layer3D) or of a similar type (Layer2D). This may be described by a multiple value field of type node (MFNode). Besides, this node may be the parent of nodes representing 3D objects. This also may be described by a multiple value field of type node (MFNode).

In the special case where several views of the same 3D world (or object) are needed, bindable nodes pose a problem because it is no longer possible to say that only one of each may be active at the same time in the whole application. However, only one of each may be active in each Layer3D. This behavior requires that the Layer3D node has an exposedField for each of the bindable nodes.

In the BIFS language, the Layer3D node is described as follows:

Layer3D {
  exposedField MFNode  children3D     []
  exposedField MFNode  childrenLayer  []
  exposedField SFVec2f translation    0 0
  exposedField SFInt32 depth          0
  exposedField SFVec2f size           −1 −1
  exposedIn    SFNode  background     NULL
  exposedIn    SFNode  fog            NULL
  exposedIn    SFNode  navigationInfo NULL
  exposedIn    SFNode  viewpoint      NULL
}

The children3D field can have as value any 3D grouping or children nodes that define a 3D scene. The childrenLayer field can have either a 2D or 3D layer as values. The layering of the 2D and 3D layers is specified by the translation and depth fields. The translation field is expressed, as in the case of the Layer2D, either in pixels or in “graphics meters”, between 0.0 and 1.0. The size parameter has the same semantic and units as in the Layer2D. A size of −1 in one direction means that the Layer3D node is not specified in size in that direction, and that the viewer would decide the size of the rendering area. All bindable children nodes are used as exposedFields of the Layer3D node. At run-time, these fields take the value of the currently bound bindable children nodes for the 3D scene that is a child of the Layer3D node. This makes it possible, for instance, to set the current viewpoint of a Layer3D in response to some event, which cannot be achieved by a direct use of the set_bind eventIn of the Viewpoint nodes, since scenes can be shared between different layers.
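As a sketch only, the per-layer binding described above may be modelled in Python as follows. The classes `Viewpoint` and `Layer3D` and the method `bind_viewpoint` are hypothetical simplifications of the nodes of the same name; the sketch assumes that the bound viewpoint is stored on the layer rather than on the shared scene.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Viewpoint:
    name: str


@dataclass
class Layer3D:
    children3D: List[object]               # the (possibly shared) 3D scene
    viewpoint: Optional[Viewpoint] = None  # viewpoint bound for this layer only

    def bind_viewpoint(self, viewpoint: Viewpoint) -> None:
        """Bind a viewpoint of the scene for this layer, leaving others untouched."""
        if viewpoint not in self.children3D:
            raise ValueError("viewpoint does not belong to this layer's scene")
        self.viewpoint = viewpoint


front = Viewpoint("front")
top = Viewpoint("top")
shared_scene = [front, top, "cube"]

layer_1 = Layer3D(shared_scene, viewpoint=front)
layer_2 = Layer3D(shared_scene, viewpoint=front)

# Changing the viewpoint of layer_2 does not affect layer_1,
# although both layers render the same scene.
layer_2.bind_viewpoint(top)
```

Storing the binding on the layer is what allows one viewpoint to be active per Layer3D instead of one per application.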

In the case where a 3D scene is shared between several Layer3D, the behaviour of the various Sensor nodes is defined as follows: a sensor triggers an event whenever the sensor is triggered in any of the Layer3D that contains it.

(C) Composite2DTexture Definition and Semantic

The Composite2DTexture is a texture node, like the VRML 2.0 ImageTexture node. However, it is defined as a grouping node. It may be the parent of any 2D node. The texture represented by this node results from the composition of a 2D scene described in the children field.

In the BIFS language, the Composite2DTexture node is described as follows:

Composite2DTexture {
  exposedField MFNode  children2D []
  exposedField SFVec2f size       −1 −1
}

The children2D field of type MFNode is the list of 2D grouping and children nodes that define the 2D scene to be mapped onto the 3D object. The size field specifies the size of this map. The units are the same as in the case of the Layer2D/3D. If left as default value, an undefined size will be used. This Composite2DTexture node can only be used as a texture field of an Appearance node.

(D) Composite3DTexture Definition and Semantic

The Composite3DTexture is a texture node, like the VRML 2.0 ImageTexture node. However, it is defined as a grouping node. It may be the parent of any 3D node. The texture represented by this node results from the composition of a 3D scene described in the children field. As for the Layer3D node, the issue of bindable nodes is solved using exposed fields.

In the BIFS language, the Composite3DTexture node is described as follows:

Composite3DTexture {
  exposedField MFNode  children3D     []
  exposedField SFVec2f size           −1 −1
  exposedIn    SFNode  background     NULL
  exposedIn    SFNode  fog            NULL
  exposedIn    SFNode  navigationInfo NULL
  exposedIn    SFNode  viewpoint      NULL
}

The children3D field of type MFNode is the list of 3D grouping and children nodes that define the 3D scene to be mapped onto the 3D object. The size field specifies the size in pixels of this map (if left as default value, an undefined size will be used). The four following fields represent the current values of the bindable children nodes used in the 3D scene. This Composite3DTexture node can only be used as a texture field of an Appearance node.

(E) CompositeMap Definition and Semantic

The CompositeMap node is a special case of the Composite2DTexture node that is represented in a rectangle of the z=0 plane of the local coordinate system. This useful subset of a Composite2DTexture node makes it possible to deal efficiently with many simple cases of combined 2D and 3D composition.

In the BIFS language, the CompositeMap node is described as follows:

CompositeMap {
  exposedField MFNode  children2D []
  exposedField SFVec2i sceneSize  −1 −1
  exposedField SFVec2f center     0 0
  exposedField SFVec2f mapSize    1.0 1.0
}

The children2D field of type MFNode is the list of 2D grouping and children nodes that define the 2D scene to be mapped onto the 3D object. The sceneSize field specifies the size in pixels of the 2D composited scene (if left as default value, an undefined size will be used). The center field specifies the coordinate of the center of the CompositeMap in the xOy coordinate system. The mapSize field specifies the size, in the 3D space measure, of the rectangle area where the 2D scene is to be mapped. This node can be used as any 3D children node.
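The geometry implied by the center and mapSize fields may be worked out as in the following Python sketch. The function name `composite_map_corners` is hypothetical, and the sketch assumes that mapSize gives the extent of the rectangle along the x and y axes and that the rectangle is axis-aligned in the z=0 plane.

```python
def composite_map_corners(center=(0.0, 0.0), map_size=(1.0, 1.0)):
    """Return the four corners (x, y, z) of the mapped rectangle, counter-clockwise.

    Assumption: `center` is the middle of the rectangle in the xOy plane and
    `map_size` is its (width, height) in 3D space units.
    """
    cx, cy = center
    half_w, half_h = map_size[0] / 2.0, map_size[1] / 2.0
    return [
        (cx - half_w, cy - half_h, 0.0),  # lower left
        (cx + half_w, cy - half_h, 0.0),  # lower right
        (cx + half_w, cy + half_h, 0.0),  # upper right
        (cx - half_w, cy + half_h, 0.0),  # upper left
    ]


# With the default field values, the rectangle is the unit square centred on the origin.
default_corners = composite_map_corners()
```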

(F) Valuator Definition and Semantic

The Valuator node is a node used to route a pre-defined value to a field of another node. It has an exposedField of each existing type. The Valuator is triggered whenever one of its exposedFields is modified, or may be triggered through an eventIn.

In the BIFS language, the Valuator node is described as follows:

Valuator {
  eventIn      SFBool     set_Active
  exposedField SFBool     boolValue     TRUE
  exposedField SFColor    colorValue    0 0 0
  exposedField SFFloat    floatValue    0.0
  exposedField SFImage    imageValue    NULL
  exposedField SFInt32    intValue      0
  exposedField SFNode     nodeValue     NULL
  exposedField SFRotation rotationValue 1 0 0 0
  exposedField SFVec2f    vec2fValue    0.0 0.0
  exposedField SFVec3f    vec3FValue    0.0 0.0 0.0
}

Each parameter is simply a constant value holder. This value can be routed to another field of the same type, so that values can be set to fields explicitly. The routing can be activated with the eventIn set_Active field.
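The behaviour of the Valuator may be illustrated by the following Python sketch, which is a simplified model and not the BIFS node itself. The class `Valuator` and its methods `route`, `set_value` and `set_active` are hypothetical; the sketch assumes that a route can be represented by a callback receiving the held value, and that a FALSE value received on set_Active triggers nothing.

```python
class Valuator:
    """Constant value holder whose values can be pushed to routed fields."""

    def __init__(self, **values):
        self.values = dict(values)  # e.g. colorValue=(1.0, 0.0, 0.0)
        self.routes = []            # (value name, destination callback) pairs

    def route(self, value_name, destination):
        """Route one held value to a destination callback (stand-in for a field)."""
        if value_name not in self.values:
            raise KeyError(value_name)
        self.routes.append((value_name, destination))

    def set_value(self, value_name, value):
        """Modifying a held value triggers the routes attached to it."""
        if value_name not in self.values:
            raise KeyError(value_name)
        self.values[value_name] = value
        for name, destination in self.routes:
            if name == value_name:
                destination(value)

    def set_active(self, active):
        """Stand-in for the set_Active eventIn: push every routed value."""
        if active:
            for name, destination in self.routes:
                destination(self.values[name])


# Usage: set the colour of a cube to a pre-defined value when triggered.
cube = {"color": (0.0, 0.0, 0.0)}
red = Valuator(colorValue=(1.0, 0.0, 0.0))
red.route("colorValue", lambda color: cube.update(color=color))
red.set_active(True)
```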

The above-described solution solves the addressed problems. A single representation for a complete 2D/3D scene and a global interactivity with 2D and 3D objects are indeed obtained, and since 2D and 3D objects are now described in a same file (or stream), it is possible to use the same routing mechanism between fields. An example of this functionality is given in annex A: for a 3D scene composed of one cube and a color palette represented as 2D circles in a 2D scene, when the user touches a color in this palette, the cube color is set to the touched color.

Moreover, as shown in FIG. 1, the two nodes Layer2D and Layer3D have been designed to organize the scene in a single global hierarchy. It must also be indicated that 2D composited scenes as texture maps and 2D Composite maps are conceptually very similar. The Composite map defines a rectangular facet texture mapped with a 2D composited scene. The 2D composited scene as texture map is a texture that may be mapped on any geometry.

Annex B gives an example of a Composite map. In this example, one has at the origin of the world a 2.0×4.0 rectangular region on the ground composed of 2 images. The user may touch any of the 2 images to trigger an action (the actions are not specified in the example).

Annex C gives, for 3D composited scenes as texture maps, another example of a Composite map. In this example, one has a cube in a Layer3D. This cube has a texture map that is composed of the rendering of a cylinder viewed from a specified viewpoint. The user may touch the cylinder to trigger an action (the action is not specified in the example).

Concerning multiple views of a same scene, the proposed solution allows a same scene to be displayed in several Layer3D from different viewpoints. Besides, the viewpoint of this scene may be modified by touching some 2D image. This functionality is shown in the example given in the last annex D.

What is claimed is:
 1. A method of decoding coded digital signals representative of audiovisual data and available in the form of a continuous bitstream in view of the binary description of a scene to be rendered on a displaying device, said method comprising a processing operation based on an evolutive syntactic language and including the steps: extracting, from said bitstream, distinct elements called objects according to the structure of said scene; defining an individual animation of said elements of the scene; defining particular interactions between a user and said elements; and organizing specific relations between said scene elements and corresponding individual animations and/or user interactions according to various classes of applications; wherein said processing operation further comprises an additional step of describing a complex scene, built from any kind of bidimensional and tridimensional objects, according to a framework integrating both bidimensional and tridimensional features and unifying the composition and representation mechanisms of the scene structure, wherein said additional step comprises a first main sub-step of defining a hierarchical representation of said scene according to a tree structure organized both in grouping nodes, that indicate the hierarchical connections giving the spatial composition of the concerned scene, and in children nodes, that constitute the leaves of the tree; and a second auxiliary sub-step of defining transversal connections between any kind of nodes.
 2. A method according to claim 1, wherein the nodes of the tree structure comprise at least bidimensional objects and tridimensional objects, and the auxiliary sub-step comprises: a first step of embedding at least one of said bidimensional objects within at least one of said tridimensional objects; a second step of defining transversal connections between said tridimensional and bidimensional objects; and a third step of controlling the second defining step for at least one individual animation and/or at least one particular interaction both in the embedded bidimensional object(s) and in the corresponding original one(s).
 3. A method according to claim 1, wherein the nodes of the tree structure comprise at least bidimensional and tridimensional objects, and the auxiliary sub-step comprises: a first step of embedding at least one of said tridimensional objects within at least one of said bidimensional objects; a second step of defining transversal connections between said bidimensional and tridimensional objects; and a third step of controlling the second defining step for at least one individual animation and/or at least one particular interaction both in the embedded tridimensional object(s) and in the corresponding original one(s).
 4. A method according to claim 1, wherein the nodes of the tree structure comprise at least tridimensional objects, and the auxiliary sub-step comprises: a first step of embedding at least one of said tridimensional objects within at least one of any one of said tridimensional objects; a second step of defining transversal connections between said tridimensional objects; and a third step of controlling the second defining step for at least one individual animation and/or at least one particular interaction both in the embedded tridimensional object(s) and in the corresponding original one(s).
 5. A method according to claim 3, wherein said auxiliary sub-step also comprises an additional step of controlling the simultaneous rendering of at least one single tridimensional scene from various viewpoints, while maintaining the third step of controlling the second defining step of the individual animation(s) and/or the particular interaction(s).